Adaptive Step-Size for Policy Gradient Methods
Authors
Abstract
In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement learning field. In particular, they have been widely employed in motor control and robotic applications, thanks to their ability to cope with continuous state and action domains and partially observable problems. Policy gradient research has mainly focused on identifying effective gradient directions and on proposing efficient estimation algorithms. Nonetheless, the performance of policy gradient methods is determined not only by the gradient direction: convergence properties are strongly influenced by the choice of the step size, since small values imply a slow convergence rate, while large values may lead to oscillations or even divergence of the policy parameters. The step-size value is usually chosen by hand tuning, and little attention has been paid to its automatic selection. In this paper, we propose to determine the learning rate by maximizing a lower bound on the expected performance gain. Focusing on Gaussian policies, we derive a lower bound that is a second-order polynomial of the step size, and we show how a simplified version of this bound can be maximized when the gradient is estimated from trajectory samples. The properties of the proposed approach are empirically evaluated on a linear-quadratic regulator problem.
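As a concrete illustration of the idea stated in the abstract, the following minimal Python sketch assumes the performance-improvement lower bound takes the generic concave-quadratic form B(α) = aα − cα², with a = ‖∇̂θJ(θ)‖² and a problem-dependent penalty constant c > 0; the constant c used here is a placeholder, not the paper's exact second-order term. The maximizing step size is then α* = a/(2c).

import numpy as np

def optimal_step_size(grad_estimate, c):
    # Maximize the quadratic lower bound B(alpha) = a*alpha - c*alpha**2,
    # where a = ||grad||^2 and c > 0 is a problem-dependent penalty constant
    # (an illustrative stand-in for the paper's second-order term).
    a = float(grad_estimate @ grad_estimate)   # first-order improvement term
    return a / (2.0 * c)                       # maximizer of the concave quadratic

# Toy usage with an arbitrary gradient estimate and penalty constant.
grad = np.array([0.4, -1.2, 0.7])
alpha = optimal_step_size(grad, c=5.0)
theta = np.zeros(3)
theta_new = theta + alpha * grad               # gradient-ascent update with adaptive step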
Similar papers
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that ...
Adaptive Batch Size for Safe Policy Gradients
PROBLEM
• Monotonically improve a parametric Gaussian policy πθ in a continuous MDP, avoiding unsafe oscillations in the expected performance J(θ).
• Episodic Policy Gradient (a sketch follows this list):
  – estimate ∇̂θJ(θ) from a batch of N sample trajectories;
  – update θ′ ← θ + Λ∇̂θJ(θ).
• Tune the step size α and the batch size N to limit oscillations. Not trivial:
  – Λ: trade-off with speed of convergence ← adaptive methods.
  – N: trade-off...
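A minimal sketch of the episodic policy-gradient step referenced in the list above, assuming a linear-mean Gaussian policy a ~ N(θᵀs, σ²) and a REINFORCE-style estimator; the estimator and the trajectory format are illustrative assumptions, not the exact ones used in that work.

import numpy as np

def estimate_policy_gradient(trajectories, theta, sigma=1.0):
    # REINFORCE-style estimate of grad_theta J(theta), averaged over a batch of
    # N trajectories, each given as a list of (state, action, reward) tuples,
    # for a Gaussian policy with mean theta^T s and fixed standard deviation sigma.
    grads = []
    for traj in trajectories:
        ret = sum(r for _, _, r in traj)                      # trajectory return
        score = sum(((a - theta @ s) / sigma ** 2) * s        # sum of per-step score functions
                    for s, a, _ in traj)
        grads.append(ret * score)
    return np.mean(grads, axis=0)

# One update with step size alpha and batch size N (both to be tuned):
#   grad_hat = estimate_policy_gradient(batch_of_N_trajectories, theta)
#   theta = theta + alpha * grad_hat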
Gradient Methods with Adaptive Step-Sizes
Motivated by the superlinear behavior of the Barzilai-Borwein (BB) method for two-dimensional quadratics, we propose two gradient methods which adaptively choose a small step-size or a large step-size at each iteration. The small step-size is primarily used to induce a favorable descent direction for the next iteration, while the large step-size is primarily used to produce a sufficient reducti...
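A minimal sketch of the two classic Barzilai-Borwein step sizes that such adaptive small/large schemes choose between, applied to a toy quadratic; the simple alternation rule used here is illustrative, not the selection rule proposed in that paper.

import numpy as np

def bb_step_sizes(x_prev, x_curr, g_prev, g_curr):
    # The two Barzilai-Borwein step sizes computed from successive iterates
    # and gradients: BB2 (short) and BB1 (long).
    s = x_curr - x_prev
    y = g_curr - g_prev
    return (s @ y) / (y @ y), (s @ s) / (s @ y)   # (short, long)

# Gradient descent on f(x) = 0.5 * x^T A x, alternating short and long steps.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_prev, x_curr = np.array([1.0, 1.0]), np.array([0.9, 0.5])
for k in range(15):
    if np.linalg.norm(grad(x_curr)) < 1e-10:      # stop once (numerically) converged
        break
    short, long_ = bb_step_sizes(x_prev, x_curr, grad(x_prev), grad(x_curr))
    alpha = short if k % 2 else long_             # naive alternation, for illustration only
    x_prev, x_curr = x_curr, x_curr - alpha * grad(x_curr)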
Off-policy learning based on weighted importance sampling with linear computational complexity
Importance sampling is an essential component of model-free off-policy learning algorithms. Weighted importance sampling (WIS) is generally considered superior to ordinary importance sampling but, when combined with function approximation, it has hitherto required computational complexity that is O(n2) or more in the number of features. In this paper we introduce new off-policy learning algorit...
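For context, a minimal sketch contrasting ordinary and weighted importance sampling on per-trajectory returns; extending the weighted estimator to linear function approximation at O(n) cost is the contribution of the work above and is not shown here. The returns and ratios in the usage line are made up.

import numpy as np

def is_estimates(returns, rhos):
    # Ordinary vs. weighted importance sampling estimates of an off-policy value,
    # given returns G_i observed under the behaviour policy and per-trajectory
    # importance ratios rho_i = prod_t pi(a_t|s_t) / mu(a_t|s_t).
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    ois = np.mean(rhos * returns)                  # unbiased, but high variance
    wis = np.sum(rhos * returns) / np.sum(rhos)    # biased, usually lower variance
    return ois, wis

# Toy usage.
print(is_estimates([1.0, 0.0, 2.0], [0.5, 2.0, 1.5]))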
A stochastic gradient adaptive filter with gradient adaptive step size
This paper presents an adaptive step-size gradient adaptive filter. The step size of the adaptive filter is changed according to a gradient descent algorithm designed to reduce the squared estimation error during each iteration. An approximate analysis of the performance of the adaptive filter when its inputs are zero mean, white, and Gaussian and the set of optimal coefficients are time varyin...
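A minimal sketch of an LMS filter whose step size is itself adapted by a gradient rule on the squared error, in the spirit of the scheme described above; the update constants, clipping bounds, and system-identification setup are illustrative choices, not taken from the paper.

import numpy as np

def lms_with_adaptive_mu(x, d, order=4, mu0=0.01, rho=1e-4):
    # Standard LMS coefficient update w <- w + mu * e * u, where the step size mu
    # is adjusted using the correlation between successive errors and regressors
    # (a gradient-descent rule on the squared estimation error).
    w = np.zeros(order)
    mu = mu0
    u_prev, e_prev = np.zeros(order), 0.0
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]           # regressor [x[n], ..., x[n-order+1]]
        e = d[n] - w @ u                           # estimation error
        mu = float(np.clip(mu + rho * e * e_prev * (u @ u_prev), 1e-6, 0.1))
        w = w + mu * e * u
        u_prev, e_prev = u, e
    return w

# Identify a toy FIR system from noisy measurements.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
true_w = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, true_w)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w_hat = lms_with_adaptive_mu(x, d)                 # should approach true_w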